
Inside Macintosh: Programming With the Text Encoding Conversion Manager /
Appendix B - Character Encodings Concepts


Non-Unicode Character Encodings

Most of these encodings are designed to support a single writing system, or a group of writing systems that share the same script. As a result, some encodings are treated as implying a particular language, even though language is information that belongs several layers higher in the architectural model described previously in this appendix.

Appendix C provides a more complete list of character encodings (but with less explanatory material), grouped by the writing systems they cover.

General Character Set Structure

ISO 2022 and ISO 4873 define a structure for coded character sets using 7-bit or 8-bit values. These coded character sets provide a means of representing both graphic characters and control functions; control functions that can be represented with a single code point are also called control characters.

For character sets using 7-bit values, the range 0x00-0x1F is reserved for a set of 32 control characters, designated C0; another set of 32 control functions, designated C1, may be represented with escape sequences. The range 0x20-0x7F (96 code points) is reserved for up to four sets of graphic characters, designated G0-G3 (in some graphic sets, each code point requires two or three 7-bit values). Most Gn sets use only the 94 code points 0x21-0x7E, in which case 0x20 is reserved for SPACE and 0x7F is reserved for DELETE. ISO 2022 specifies a protocol for designating which sets are assigned to G0-G3 and for switching among them, using escape sequences and shift functions.
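As a rough illustration of the 7-bit layout, the following Python sketch sorts each 7-bit value into the regions just described and recognizes ESC followed by a byte in 0x40-0x5F as a C1 control function. The name `tokenize_7bit` is hypothetical, and the sketch assumes a 94-character graphic set. It does not interpret designation escape sequences or shift functions, so graphic values are reported without deciding which Gn set they belong to.

```python
ESC = 0x1B  # the C0 control ESCAPE


def tokenize_7bit(data: bytes) -> list[tuple[str, int]]:
    """Split 7-bit data into (region, value) pairs.

    Hypothetical helper: assumes a 94-character graphic set, so 0x20 is
    SPACE and 0x7F is DELETE. ESC followed by 0x40-0x5F is reported as a
    C1 control function carrying the second byte as its value; any other
    ESC is reported as an ordinary C0 control.
    """
    tokens: list[tuple[str, int]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b > 0x7F:
            raise ValueError(f"0x{b:02X} is not a 7-bit value")
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == ESC and nxt is not None and 0x40 <= nxt <= 0x5F:
            tokens.append(("C1", nxt))  # C1 function written as a two-value escape sequence
            i += 2
            continue
        if b <= 0x1F:
            region = "C0"
        elif b == 0x20:
            region = "SPACE"
        elif b == 0x7F:
            region = "DELETE"
        else:
            region = "graphic"  # 0x21-0x7E; meaning depends on the active Gn set
        tokens.append((region, b))
        i += 1
    return tokens
```

For example, `tokenize_7bit(b"A\x1bD B")` yields a graphic value, a C1 function (ESC D), a SPACE, and another graphic value.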

For 8-bit character sets, the C0 set still uses 0x00-0x1F, while the C1 set moves to 0x80-0x9F. The G0 set uses 0x21-0x7E (with SPACE and DELETE reserved as before), while the G1, G2, and G3 sets share the range 0xA0-0xFF (96 code points). Figure B-3 shows these differences.
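The 8-bit layout can be sketched the same way. In this hypothetical Python example, `classify_8bit` reports which region a byte falls in, and `c1_to_7bit` converts an 8-bit C1 byte to the equivalent 7-bit escape sequence: ESC followed by the byte minus 0x40, which is the standard correspondence between the two forms. Whether 0xA0 and 0xFF are usable graphic positions depends on whether the set invoked there has 94 or 96 characters; the sketch does not make that distinction.

```python
ESC = 0x1B  # the C0 control ESCAPE


def classify_8bit(b: int) -> str:
    """Return the ISO 2022 / ISO 4873 region for one 8-bit value."""
    if not 0x00 <= b <= 0xFF:
        raise ValueError(f"{b} is not an 8-bit value")
    if b <= 0x1F:
        return "C0"
    if b == 0x20:
        return "SPACE"
    if b <= 0x7E:
        return "G0"
    if b == 0x7F:
        return "DELETE"
    if b <= 0x9F:
        return "C1"
    return "G1-G3"  # 0xA0-0xFF, shared by whichever of G1, G2, G3 is invoked


def c1_to_7bit(b: int) -> bytes:
    """Convert an 8-bit C1 control (0x80-0x9F) to its 7-bit ESC form (ESC 0x40-0x5F)."""
    if classify_8bit(b) != "C1":
        raise ValueError(f"0x{b:02X} is not a C1 control")
    return bytes([ESC, b - 0x40])
```

For instance, the 8-bit C1 byte 0x84 becomes the two 7-bit values ESC D (0x1B 0x44).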

Figure B-3 Comparison of 7-bit and 8-bit character set structures

The G0 set is typically the ISO 646 international reference version (ASCII). The C0 and C1 control functions are typically from ISO 6429, although other control sets can be used.
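For the common case described here, where G0 is ASCII and the C0 set comes from ISO 6429, a byte string can be displayed by showing graphic values as ASCII characters and control values by their ISO 6429 acronyms. This Python sketch is illustrative only: the `render` name is hypothetical, and values above 0x7F are shown in hex because no G1-G3 or C1 set is assumed.

```python
# ISO 6429 acronyms for the C0 controls 0x00-0x1F, in code order.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS4 IS3 IS2 IS1"
).split()


def render(data: bytes) -> str:
    """Show C0 controls by name, 0x20-0x7E as ASCII (G0), everything else in hex."""
    parts = []
    for b in data:
        if b <= 0x1F:
            parts.append(f"<{C0_NAMES[b]}>")
        elif b == 0x7F:
            parts.append("<DEL>")
        elif b <= 0x7E:
            parts.append(chr(b))  # SPACE and the 94 ASCII graphic characters
        else:
            parts.append(f"<0x{b:02X}>")  # outside C0/G0; no upper-half set assumed
    return "".join(parts)
```

With this sketch, `render(b"Hi\r\n")` produces `Hi<CR><LF>`.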

Simple Coded Character Sets

Simple coded character sets use a fixed number of 7-bit or 8-bit values to represent each code point. Here are some examples for different code point sizes.
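Because a simple coded character set uses the same number of values for every code point, splitting a byte sequence into code points is plain fixed-width arithmetic. The Python sketch below is hypothetical: `split_fixed_width` chunks data by a given width, and `row_cell` shows one common convention for two-value 94×94 sets (such as JIS X 0208 in 7-bit form), where each value in 0x21-0x7E is reduced by 0x20 to give a row and a cell numbered 1-94.

```python
def split_fixed_width(data: bytes, width: int) -> list[tuple[int, ...]]:
    """Chunk data into code points of exactly `width` values each."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if len(data) % width:
        raise ValueError("data length is not a multiple of the code point width")
    return [tuple(data[i:i + width]) for i in range(0, len(data), width)]


def row_cell(code_point: tuple[int, ...]) -> tuple[int, int]:
    """Map a two-value 94x94 code point (each value 0x21-0x7E) to (row, cell), 1-94."""
    if len(code_point) != 2 or not all(0x21 <= v <= 0x7E for v in code_point):
        raise ValueError("expected two values in 0x21-0x7E")
    first, second = code_point
    return first - 0x20, second - 0x20
```

For example, splitting `b"\x30\x21\x7e\x7e"` with width 2 and applying `row_cell` gives rows and cells (16, 1) and (94, 94).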




© Apple Computer, Inc.
13 NOV 1997